CrowdTruth for Sparse Multiple Choice Tasks: Event Extraction

In this tutorial, we will apply CrowdTruth metrics to a sparse multiple choice crowdsourcing task for Event Extraction from sentences. Workers were asked to read a sentence and then pick from a multiple choice list the words or word phrases in the sentence that express events or actions. The options available in the multiple choice list change with the input sentence. The task was executed on FigureEight. For more crowdsourcing annotation task examples, click here.

In this tutorial, we will also show how to translate an open task into a closed task by processing both the input units and the annotations of a crowdsourcing task, and how this impacts the results of the CrowdTruth quality metrics. We start with the open-ended extraction task, where the crowd reads a sentence and picks from a multiple choice list the words or word phrases that express events or actions.

To replicate this experiment, the code used to design and implement this crowdsourcing annotation template is available here: template, css, javascript.

This is a screenshot of the task as it appeared to workers:

A sample dataset for this task is available in this file, containing raw output from the crowd on FigureEight. Download the file and place it in a folder named data that is a sibling of the folder containing this notebook (the code below reads it from ../data/). Now you can check your data:


In [1]:
import pandas as pd

test_data = pd.read_csv("../data/event-text-sparse-multiple-choice.csv")
test_data.head()


Out[1]:
_unit_id _created_at _id _started_at _tainted _channel _trust _worker_id _country _region ... events_count original_sentence processed_sentence selectedtags_desc_gold sentence sentence_id stanford_lemmas stanford_pos_tags tokens validate_verbs
0 1883297207 8/31/2018 08:18:12 4019711384 8/31/2018 08:11:44 False clixsense 1 6481150 AUS 8 ... 4 Separately, Esselte Business Systems reported ... Separately , Esselte Business Systems reported... NaN NaN 11 NaN NaN 39 NaN
1 1883297207 8/31/2018 21:59:16 4021335631 8/31/2018 21:58:58 False gifthunterclub 1 43861575 USA NaN ... 4 Separately, Esselte Business Systems reported ... Separately , Esselte Business Systems reported... NaN NaN 11 NaN NaN 39 NaN
2 1883297214 8/30/2018 12:57:05 4016914193 8/30/2018 12:56:45 False clixsense 1 31988217 GBR P5 ... 5 But some other parties and social organization... But some other parties and social organization... NaN NaN 10 NaN NaN 39 NaN
3 1883297214 8/31/2018 12:36:21 4020056124 8/31/2018 12:35:42 False instagc 1 23503585 CAN SK ... 5 But some other parties and social organization... But some other parties and social organization... NaN NaN 10 NaN NaN 39 NaN
4 1883297214 8/30/2018 15:00:13 4017194252 8/30/2018 14:59:21 False prodege 1 11131207 CAN ON ... 5 But some other parties and social organization... But some other parties and social organization... NaN NaN 10 NaN NaN 39 NaN

5 rows × 26 columns

Declaring a pre-processing configuration

The pre-processing configuration defines how to interpret the raw crowdsourcing input. To do this, we need to define a configuration class. First, we import the default CrowdTruth configuration class:


In [2]:
import crowdtruth
from crowdtruth.configuration import DefaultConfig

Our test class inherits the default configuration DefaultConfig, while also declaring some additional attributes that are specific to the Event Extraction task:

  • inputColumns: list of input columns from the .csv file with the input data
  • outputColumns: list of output columns from the .csv file with the answers from the workers
  • annotation_separator: string that separates the crowd annotations in outputColumns
  • open_ended_task: boolean variable defining whether the task is open-ended (i.e. the possible crowd annotations are not known beforehand, like in the case of free text input); although workers pick answers from a list, that list changes with every input sentence, so we first process the task as open-ended and set this variable to True; later in this tutorial we transform it into a closed task
  • annotation_vector: list of possible crowd answers, mandatory to declare when open_ended_task is False; it is not needed for the open version of the task, but in the closed version it will hold the lemmatized events collected from all input sentences
  • processJudgments: method that defines the processing of the raw crowd data; for this task, we clean up the crowd answers so that they can be matched against the annotations

The complete configuration class is declared below:


In [3]:
class TestConfig(DefaultConfig):
    inputColumns = ["doc_id", "events", "events_count", "original_sentence", "processed_sentence", "sentence_id", "tokens"]
    outputColumns = ["selected_events"]
    
    annotation_separator = ","
        
    # processing of an open task
    open_ended_task = True
    
    def processJudgments(self, judgments):
        # pre-process output to match the values in annotation_vector
        for col in self.outputColumns:
            # transform to lowercase
            judgments[col] = judgments[col].apply(lambda x: str(x).lower())
            # remove square brackets from annotations
            judgments[col] = judgments[col].apply(lambda x: str(x).replace('[',''))
            judgments[col] = judgments[col].apply(lambda x: str(x).replace(']',''))
            # remove the quotes around the annotations
            judgments[col] = judgments[col].apply(lambda x: str(x).replace('"',''))
        return judgments
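
To make the effect of processJudgments concrete, here is a minimal sketch that applies the clean-up to one raw answer string taken from the data shown earlier (the sample DataFrame is constructed just for illustration):

import pandas as pd

# one raw crowd answer, exactly as it appears in the .csv
raw = pd.DataFrame({"selected_events": ['["$ 10.1__129__135","reported__38__46"]']})

cleaned = TestConfig().processJudgments(raw)
print(cleaned["selected_events"][0])
# expected: $ 10.1__129__135,reported__38__46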

Pre-processing the input data

After declaring the configuration of our input file, we are ready to pre-process the crowd data:


In [4]:
data_open, config = crowdtruth.load(
    file = "../data/event-text-sparse-multiple-choice.csv",
    config = TestConfig()
)

data_open['judgments'].head()


Out[4]:
output.selected_events output.selected_events.count output.selected_events.unique submitted started worker unit duration job
judgment
4019711384 {u'$ 10.1__129__135': 1, u'reported__38__46': ... 4 4 2018-08-31 08:18:12 2018-08-31 08:11:44 6481150 1883297207 388 ../data/event-text-sparse-multiple-choice
4021335631 {u'$ 10.1__129__135': 1} 1 1 2018-08-31 21:59:16 2018-08-31 21:58:58 43861575 1883297207 18 ../data/event-text-sparse-multiple-choice
4016914193 {u'accession__100__109': 1, u'bring__174__179'... 2 2 2018-08-30 12:57:05 2018-08-30 12:56:45 31988217 1883297214 20 ../data/event-text-sparse-multiple-choice
4020056124 {u'accession__100__109': 1, u'bring__174__179'... 2 2 2018-08-31 12:36:21 2018-08-31 12:35:42 23503585 1883297214 39 ../data/event-text-sparse-multiple-choice
4017194252 {u'accession__100__109': 1, u'claiming__80__88... 3 3 2018-08-30 15:00:13 2018-08-30 14:59:21 11131207 1883297214 52 ../data/event-text-sparse-multiple-choice

Computing the CrowdTruth metrics

The pre-processed data can then be used to calculate the CrowdTruth metrics:


In [6]:
results_open = crowdtruth.run(data_open, config)

results_open is a dict object that contains the quality metrics for sentences, events and crowd workers.
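
As a quick sanity check, you can list the metric groups in the result (a sketch; the exact set of keys comes from the crowdtruth implementation):

print(results_open.keys())
# expect at least 'units', 'workers' and 'annotations', which are used below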

The sentence metrics are stored in results_open["units"]:


In [7]:
results_open["units"].head()


Out[7]:
duration input.doc_id input.events input.events_count input.original_sentence input.processed_sentence input.sentence_id input.tokens job output.selected_events output.selected_events.annotations output.selected_events.unique_annotations worker uqs unit_annotation_score uqs_initial unit_annotation_score_initial
unit
1883297207 80.90 wsj_1033.tml $ 10.1__129__135###reported__38__46###fell__72... 4 Separately, Esselte Business Systems reported ... Separately , Esselte Business Systems reported... 11 39 ../data/event-text-sparse-multiple-choice {u'$ 9.5__86__91': 1, u'reported__38__46': 18,... 35 4 20 0.811172 {u'reported__38__46': 0.946635621845, u'$ 10.1... 0.731037 {u'reported__38__46': 0.9, u'$ 10.1__129__135'...
1883297208 62.70 APW19990607.0041.tml purports__5__13###be__17__19###said__179__183#... 5 Kopp purports to be a devout Roman Catholic, a... Kopp purports to be a devout Roman Catholic , ... 14 39 ../data/event-text-sparse-multiple-choice {u'no_event': 1, u'be__17__19': 4, u'said__179... 50 6 20 0.565027 {u'no_event': 0.0312696271625, u'be__17__19': ... 0.471512 {u'no_event': 0.05, u'be__17__19': 0.2, u'said...
1883297209 49.65 NYT19981025.0216.tml protect__45__52###murdered__77__85###said__97_... 7 ``We as Christians have a responsibility to pr... `` We as Christians have a responsibility to p... 14 39 ../data/event-text-sparse-multiple-choice {u'want__177__181': 4, u'have__20__24': 4, u'm... 63 6 20 0.649039 {u'want__177__181': 0.246818574648, u'have__20... 0.561513 {u'want__177__181': 0.2, u'have__20__24': 0.2,...
1883297210 63.65 NYT19981026.0446.tml opposed__170__177###followed__122__130###was__... 5 Slepian's death was among the first topics rai... Slepian 's death was among the first topics ra... 16 39 ../data/event-text-sparse-multiple-choice {u'opposed__170__177': 8, u'followed__122__130... 48 5 20 0.614187 {u'opposed__170__177': 0.52740577667, u'follow... 0.518003 {u'opposed__170__177': 0.4, u'followed__122__1...
1883297211 39.60 NYT19981026.0446.tml exploit__109__116###murder__133__139###said__2... 5 ``It's possible that New York politics has nev... `` It 's possible that New York politics has n... 43 39 ../data/event-text-sparse-multiple-choice {u'exploit__109__116': 12, u'murder__133__139'... 52 5 20 0.653157 {u'exploit__109__116': 0.705336530504, u'murde... 0.589756 {u'exploit__109__116': 0.6, u'murder__133__139...

The uqs column in results_open["units"] contains the sentence quality scores, capturing the overall worker agreement over each sentence. Here we plot its histogram:


In [8]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.hist(results_open["units"]["uqs"])
plt.xlabel("Sentence Quality Score")
plt.ylabel("Sentences")


Out[8]:
Text(0,0.5,u'Sentences')

The unit_annotation_score column in results_open["units"] contains the sentence-event scores, capturing the likelihood that each event is expressed in the sentence. For each sentence, we store a dictionary mapping each candidate event to its sentence-event score.


In [9]:
results_open["units"]["unit_annotation_score"].head(10)


Out[9]:
unit
1883297207    {u'reported__38__46': 0.946635621845, u'$ 10.1...
1883297208    {u'no_event': 0.0312696271625, u'be__17__19': ...
1883297209    {u'want__177__181': 0.246818574648, u'have__20...
1883297210    {u'opposed__170__177': 0.52740577667, u'follow...
1883297211    {u'exploit__109__116': 0.705336530504, u'murde...
1883297212    {u'returned__139__147': 0.849240421016, u'shot...
1883297213    {u'murder__80__86': 0.720750633636, u'curious_...
1883297214    {u'claiming__80__88': 0.700462062784, u'no_eve...
1883297215    {u'wars__214__218': 0.766478442225, u'stabiliz...
1883297216    {u'buildup__18__25': 0.592988447005, u'thrust_...
Name: unit_annotation_score, dtype: object
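
Since each entry is a plain dictionary, you can, for example, extract the most likely event in each sentence (a minimal sketch using the results computed above):

# the event with the highest sentence-event score per unit
results_open["units"]["unit_annotation_score"].apply(
    lambda scores: max(scores, key=scores.get)
).head()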

The worker metrics are stored in results_open["workers"]:


In [10]:
results_open["workers"].head()


Out[10]:
duration job judgment unit wqs wwa wsa wqs_initial wwa_initial wsa_initial
worker
1883983 34.500000 1 6 6 0.794872 0.820860 0.968341 0.725088 0.756548 0.958417
3587109 11.000000 1 2 2 0.627709 0.749908 0.837048 0.513732 0.648339 0.792383
4316379 24.000000 1 3 3 0.514893 0.688889 0.747424 0.383277 0.559880 0.684569
6377879 64.666667 1 6 6 0.573349 0.695490 0.824381 0.498032 0.620033 0.803234
6481150 98.047619 1 42 42 0.728838 0.776141 0.939054 0.662856 0.710559 0.932866
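
Note that in CrowdTruth the worker quality score is the product of the worker-worker agreement (wwa) and the worker-sentence agreement (wsa), which you can verify on the table above:

workers = results_open["workers"]
# wqs should equal wwa * wsa up to floating point error
print((workers["wqs"] - workers["wwa"] * workers["wsa"]).abs().max())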

The wqs column in results_open["workers"] contains the worker quality scores, capturing the overall agreement between each worker and all the other workers.


In [27]:
plt.hist(results_open["workers"]["wqs"])
plt.xlabel("Worker Quality Score")
plt.ylabel("Workers")


Out[27]:
Text(0,0.5,u'Workers')

Open to Closed Task Transformation

The goal of this crowdsourcing task is to understand how clearly a word or word phrase expresses an event or an action across all the sentences in the dataset, rather than at the level of a single sentence, as measured above. Therefore, in the remainder of this tutorial we show how to translate the open task into a closed task by processing both the input units and the annotations of the crowdsourcing task.

The answers from the crowd are stored in the selected_events column.


In [28]:
test_data["selected_events"][0:30]


Out[28]:
0     ["$ 10.1__129__135","reported__38__46","fell__...
1                                  ["$ 10.1__129__135"]
2             ["accession__100__109","bring__174__179"]
3             ["accession__100__109","bring__174__179"]
4     ["accession__100__109","claiming__80__88","bri...
5     ["accession__100__109","claiming__80__88","bri...
6     ["accession__100__109","claiming__80__88","bri...
7            ["accession__100__109","claiming__80__88"]
8            ["accession__100__109","claiming__80__88"]
9            ["accession__100__109","claiming__80__88"]
10           ["accession__100__109","claiming__80__88"]
11           ["accession__100__109","claiming__80__88"]
12                              ["accession__100__109"]
13                              ["accession__100__109"]
14             ["analyzed__82__90","distort__204__211"]
15    ["analyzed__82__90","Replied__0__7","distort__...
16                                 ["analyzed__82__90"]
17                                 ["analyzed__82__90"]
18                                 ["analyzed__82__90"]
19                                 ["analyzed__82__90"]
20    ["announced__7__16","closed__52__58","request_...
21    ["announced__7__16","closed__52__58","request_...
22    ["announced__7__16","closed__52__58","request_...
23    ["announced__7__16","closed__52__58","request_...
24    ["announced__7__16","closed__52__58","request_...
25    ["announced__7__16","closed__52__58","request_...
26    ["announced__7__16","closed__52__58","request_...
27    ["announced__7__16","closed__52__58","request_...
28                ["announced__7__16","closed__52__58"]
29               ["announced__7__16","request__27__34"]
Name: selected_events, dtype: object

Each word can be expressed in a canonical form, i.e., as a lemma. For example, the words run, runs and running all have the lemma run. As you can see in the previous cell, events in text can appear under multiple forms. To evaluate the clarity of each event across the whole dataset, we process both the input units and the crowd annotations to refer to each word in its canonical form, i.e., we lemmatize them.

Next, we define the function used to lemmatize the options that are shown to the workers in the crowdsourcing task:


In [29]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

def nltk2wn_tag(nltk_tag):
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:          
        return None
    
def lemmatize_events(event):
    # strip the "__start__end" character offsets, lowercase the word,
    # then tokenize and POS-tag it
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(str(event.lower().split("__")[0])))
    wn_tagged = map(lambda x: (str(x[0]), nltk2wn_tag(x[1])), nltk_tagged)
    res_words = []

    for word, tag in wn_tagged:
        # fall back to the NOUN lemma when the POS tag has no WordNet equivalent
        res_word = wordnet._morphy(str(word), tag if tag is not None else wordnet.NOUN)
        if res_word == []:
            # no lemma found, keep the original word
            res_words.append(str(word))
        elif len(res_word) == 1:
            res_words.append(str(res_word[0]))
        else:
            # _morphy can return both the original form and its lemma
            # (e.g. "fell" -> ["fell", "fall"]); keep the second entry, the lemma
            res_words.append(str(res_word[1]))

    lematized_keyword = " ".join(res_words)
    return lematized_keyword


[nltk_data] Downloading package punkt to /Users/oanainel/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/oanainel/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/oanainel/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
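
A quick check of the lemmatizer on a few of the crowd options (the exact output depends on the installed POS tagger and WordNet data):

for ev in ["reported__38__46", "claiming__80__88", "wars__214__218"]:
    print(ev, "->", lemmatize_events(ev))
# expected lemmas: report, claim, war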


The following functions create the values of the annotation vector and extract the lemmas of the events selected by each worker.


In [30]:
def define_annotation_vector(eventsList):  
    events = []
    for i in range(len(eventsList)):
        currentEvents = eventsList[i].split("###")
        
        for j in range(len(currentEvents)):
            if currentEvents[j] != "no_event":
                lematized_keyword = lemmatize_events(currentEvents[j])
                
                if lematized_keyword not in events:
                    events.append(lematized_keyword)
    events.append("no_event")   
    return events

def lemmatize_keywords(keywords, separator):
    keywords_list = keywords.split(separator)
    lematized_keywords = []
    
    for keyword in keywords_list:
        lematized_keyword = lemmatize_events(keyword)
        lematized_keywords.append(lematized_keyword)
    
    return separator.join(lematized_keywords)
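
For example (a sketch, assuming test_data is still loaded), we can build the annotation vector from the input events column and lemmatize one comma-separated crowd answer:

annotation_vector = define_annotation_vector(test_data["events"])
print(len(annotation_vector))  # 143 distinct values, matching the unique annotations shown below
print(lemmatize_keywords("reported__38__46,fell__72__76", ","))
# expected: report,fall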

In [31]:
class TestConfig(DefaultConfig):
    inputColumns = ["doc_id", "events", "events_count", "original_sentence", "processed_sentence", "sentence_id", "tokens"]
    outputColumns = ["selected_events"]
    
    annotation_separator = ","
        
    # processing of a closed task
    open_ended_task = False
    annotation_vector = define_annotation_vector(test_data["events"])
    
    def processJudgments(self, judgments):
        # pre-process output to match the values in annotation_vector
        for col in self.outputColumns:
            # transform to lowercase
            judgments[col] = judgments[col].apply(lambda x: str(x).lower())
            # remove square brackets from annotations
            judgments[col] = judgments[col].apply(lambda x: str(x).replace("[",""))
            judgments[col] = judgments[col].apply(lambda x: str(x).replace("]",""))
            # remove the quotes around the annotations
            judgments[col] = judgments[col].apply(lambda x: str(x).replace('"',''))
            judgments[col] = judgments[col].apply(lambda x: lemmatize_keywords(str(x), self.annotation_separator))
        return judgments

In [32]:
data_closed, config = crowdtruth.load(
    file = "data/event-text-sparse-multiple-choice.csv",
    config = TestConfig()
)

data_closed['judgments'].head()


Out[32]:
output.selected_events output.selected_events.count output.selected_events.unique submitted started worker unit duration job
judgment
4019711384 {u'$ 10.1': 1, u'report': 1, u'fall': 1, u'$ 9... 4 143 2018-08-31 08:18:12 2018-08-31 08:11:44 6481150 1883297207 388 ../data/event-text-sparse-multiple-choice
4021335631 {u'$ 10.1': 1, u'report': 0, u'fall': 0, u'$ 9... 1 143 2018-08-31 21:59:16 2018-08-31 21:58:58 43861575 1883297207 18 ../data/event-text-sparse-multiple-choice
4016914193 {u'accession': 1, u'bring': 1, u'$ 10.1': 0, u... 2 143 2018-08-30 12:57:05 2018-08-30 12:56:45 31988217 1883297214 20 ../data/event-text-sparse-multiple-choice
4020056124 {u'accession': 1, u'bring': 1, u'$ 10.1': 0, u... 2 143 2018-08-31 12:36:21 2018-08-31 12:35:42 23503585 1883297214 39 ../data/event-text-sparse-multiple-choice
4017194252 {u'accession': 1, u'claim': 1, u'bring': 1, u'... 3 143 2018-08-30 15:00:13 2018-08-30 14:59:21 11131207 1883297214 52 ../data/event-text-sparse-multiple-choice

In [36]:
results_closed = crowdtruth.run(data_closed, config)

In [37]:
results_closed["annotations"]


Out[37]:
output.selected_events aqs aqs_initial
$ 10.1 840 3.819172e-02 5.263158e-02
$ 9.5 840 1.000000e-08 1.000000e-08
accession 840 8.321034e-01 6.842105e-01
add 840 1.936832e-01 2.105263e-01
analyze 840 8.530456e-01 7.894737e-01
announce 840 8.670445e-01 7.368421e-01
appear 840 2.689285e-01 2.105263e-01
approve 840 8.577278e-01 7.894737e-01
arrest 840 8.741370e-01 7.894737e-01
assassination 840 6.594177e-01 5.789474e-01
assume 840 4.833141e-01 4.210526e-01
attempt 840 6.151563e-01 4.824561e-01
barricade 840 6.589291e-01 5.789474e-01
be 840 3.914025e-01 3.840941e-01
become 840 4.191707e-01 3.932584e-01
believe 840 4.162580e-01 3.157895e-01
block 840 4.960013e-01 3.684211e-01
bogged 840 2.190501e-01 1.578947e-01
boost 840 7.905818e-01 7.368421e-01
bring 840 4.243983e-01 3.157895e-01
buildup 840 5.694491e-01 4.736842e-01
bury 840 6.571132e-01 5.263158e-01
call 840 4.532265e-01 3.684211e-01
camped 840 8.497850e-01 7.894737e-01
casualty 840 4.454066e-01 3.684211e-01
cause 840 3.774561e-01 2.631579e-01
change 840 8.329362e-01 7.894737e-01
claim 840 6.825544e-01 6.315789e-01
close 840 8.306430e-01 6.997085e-01
come 840 2.111922e-01 1.578947e-01
... ... ... ...
say 840 6.100785e-01 5.540057e-01
see 840 5.103717e-01 4.210526e-01
sent 840 6.743174e-01 5.263158e-01
settle 840 8.276415e-01 7.368421e-01
shot 840 9.240686e-01 8.421053e-01
stabilize 840 6.390495e-01 5.789474e-01
statement 840 1.440308e-01 1.052632e-01
suffer 840 6.074152e-01 5.263158e-01
suit 840 2.744135e-01 2.105263e-01
support 840 5.642443e-01 5.263158e-01
take 840 5.161874e-01 4.210526e-01
talk 840 7.083928e-01 5.789474e-01
target 840 2.147324e-01 2.631579e-01
thrust 840 6.053758e-01 5.263158e-01
tighten 840 5.397438e-01 4.736842e-01
told 840 4.597396e-01 3.684211e-01
transaction 840 6.720130e-01 5.789474e-01
treatment 840 4.764945e-01 4.394904e-01
truth 840 1.000000e-08 1.000000e-08
tumult 840 4.540955e-01 3.684211e-01
upheld 840 9.323898e-01 8.421053e-01
use 840 2.653169e-01 2.105263e-01
vindicate 840 5.528458e-01 4.736842e-01
want 840 1.927458e-01 1.578947e-01
war 840 7.589925e-01 6.842105e-01
willingness 840 1.410162e-01 1.052632e-01
withdraw 840 8.759607e-01 8.421053e-01
withstood 840 2.353197e-01 1.578947e-01
work 840 6.638403e-01 6.315789e-01
write 840 6.304227e-01 4.736842e-01

143 rows × 3 columns
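
The aqs column holds the annotation quality score, capturing how clearly each lemmatized event is expressed across all the sentences in the dataset, which is exactly the question that motivated the closed-task transformation. To rank the events by clarity (a minimal sketch):

results_closed["annotations"].sort_values("aqs", ascending=False).head(10)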

Effect on CrowdTruth metrics

Finally, we can compare the effect of the transformation from an open task to a closed task on the CrowdTruth sentence and worker quality scores.


In [39]:
%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt

plt.scatter(
    results["units"]["uqs"],
    results_closed["units"]["uqs"],
)
plt.plot([0, 1], [0, 1], 'red', linewidth=1)
plt.title("Sentence Quality Score")
plt.xlabel("open task")
plt.ylabel("closed task")


Out[39]:
Text(0,0.5,u'closed task')

In [41]:
plt.scatter(
    results["workers"]["wqs"],
    results_closed["workers"]["wqs"],
)
plt.plot([0, 1], [0, 1], 'red', linewidth=1)
plt.title("Worker Quality Score")
plt.xlabel("open task")
plt.ylabel("closed task")


Out[41]:
Text(0,0.5,u'closed task')